Statistical models for human body pose estimation from videos
Abstract
To investigate multidimensional continuous inference from video sequences on a concrete application, we focus on the problem of articulated 3D human tracking from monocular video. This topic is interesting both because of its relevance to biological vision systems and because of its many applications in various domains. Estimating human body pose and motion is challenging, with difficulties such as self-occlusions and ambiguities. To account for the unresolvable uncertainties in the visual analysis of such footage, we formulate the task as a probabilistic inference problem. The pose estimation and tracking algorithms are based on statistical models that can be learned automatically from a set of example data. Thanks to this architecture, the proposed approaches remain general and can be tailored to a specific task through the choice of training data sets. Prior knowledge can be provided in a flexible and theoretically well-motivated way. First, we propose an approach based on a model of the joint probability distribution of body pose and the corresponding human shape, as observed in video images. By choosing suitable representations, both body pose and shape are treated as multivariate random variables. The statistical model approximates the density with a mixture of Gaussian distributions, which enables efficient discriminative inference of body poses from shape descriptors. When the unknown image locations of the persons are also taken into account, the posterior distributions become non-parametric. Therefore, a hybrid inference scheme based on a Rao-Blackwellised particle filter combines parametric inference with sample-based inference. A second approach is based on a generative predictive model of human shape, using nonlinear regression. To enable efficient learning and sample-based inference, a low-dimensional embedding of human locomotion is determined, along with a nonlinear dynamical model. 
This method is implemented using Locally Linear Embedding for the embedding and Relevance Vector Machines for sparse nonlinear regression. We also propose an integrated formulation of the model based entirely on Gaussian Process regression. The resulting tracking algorithms are tested on realistic video sequences with low resolution and image noise. We present extensions of the framework for simultaneously tracking multiple persons who occlude each other, and for recognising the performed activity along with the pose estimation.
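The discriminative step of the first approach, which infers pose from a shape descriptor under a joint Gaussian mixture, can be sketched with standard Gaussian conditioning. If the joint density over z = [x, y] (x = pose, y = shape) is a mixture with weights π_k, means μ_k and covariances Σ_k, then p(x | y) is again a Gaussian mixture. Its weights are proportional to π_k N(y; μ_{y,k}, Σ_{yy,k}), its means are μ_{x,k} + Σ_{xy,k} Σ_{yy,k}^{-1} (y - μ_{y,k}), and its covariances are Σ_{xx,k} - Σ_{xy,k} Σ_{yy,k}^{-1} Σ_{yx,k}. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes that pose and shape are already encoded as fixed-length vectors and that the mixture parameters have already been fitted, for example by EM. The function names `gaussian_logpdf` and `condition_gmm` are hypothetical, and the toy parameters are illustrative values, not data from the work.

```python
import numpy as np


def gaussian_logpdf(y, mean, cov):
    """Log density of the multivariate normal N(mean, cov) evaluated at y."""
    d = y.shape[0]
    diff = y - mean
    chol = np.linalg.cholesky(cov)          # cov = L L^T
    sol = np.linalg.solve(chol, diff)       # L^{-1} (y - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + sol @ sol)


def condition_gmm(weights, means, covs, y, dx):
    """Condition a joint GMM over z = [x, y] on an observed y.

    weights: (K,), means: (K, D), covs: (K, D, D).
    The first dx dimensions of z are x (hypothetically: pose);
    the remaining ones are y (hypothetically: a shape descriptor).
    Returns the weights, means and covariances of the conditional GMM p(x | y).
    """
    K = weights.shape[0]
    log_w = np.empty(K)
    cond_means, cond_covs = [], []
    for k in range(K):
        mx, my = means[k, :dx], means[k, dx:]
        sxx = covs[k, :dx, :dx]
        sxy = covs[k, :dx, dx:]
        syy = covs[k, dx:, dx:]
        # Gain Sxy Syy^{-1}, computed with a linear solve instead of an explicit inverse
        gain = np.linalg.solve(syy, sxy.T).T
        cond_means.append(mx + gain @ (y - my))
        cond_covs.append(sxx - gain @ sxy.T)
        # Unnormalised log responsibility of component k for the observed y
        log_w[k] = np.log(weights[k]) + gaussian_logpdf(y, my, syy)
    # Normalise in log space for numerical stability
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum(), np.array(cond_means), np.array(cond_covs)


# Toy 2-component mixture with 1-D "pose" x and 1-D "shape" y (illustrative values only)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
covs = np.array([[[1.0, 0.8], [0.8, 1.0]],
                 [[1.0, -0.5], [-0.5, 1.0]]])
w, m, c = condition_gmm(weights, means, covs, y=np.array([0.2]), dx=1)
# m ≈ [[0.16], [4.4]], c ≈ [[[0.36]], [[0.75]]]; w[0] dominates because y = 0.2 is near component 0
```

The result is multimodal by construction, which suits the pose ambiguities mentioned above. Each component keeps its own conditional mean, so alternative poses consistent with the same silhouette survive instead of being averaged away.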
Similar resources
Robust Statistical Approach for Extraction of Moving Human Silhouettes from Videos
Human pose estimation is one of the key problems in computer vision that has been studied in recent years. Its significance lies in higher-level tasks of understanding human actions, with applications such as recognition of anomalous actions in videos and many other related applications. The human poses can be estimated by extracting silhouettes of humans as silh...
A Framework for Human Pose Estimation in Videos
In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. We aim to demonstrate that by using temporal information, the human pose estimation results can be improved over image based pose estimation methods. In contrast to the commonly employed graph optimization formulation, which is NP-hard and needs approximate solutions, we formulate this problem into...
Pose Estimation and Tracking of Eating Persons in Real-life Settings
We present an approach to estimate and track 2D upper-body poses of persons who are having a meal in videos with highly challenging, uncontrolled imaging conditions. We employ a probabilistic model that represents the body as a kinematic tree and perform inference in this model using particle filtering, which also estimates self-occlusions. Our approach is evaluated with 7 different v...
Multi-camera 3D human pose estimation by fitting the projection of an articulated 3D skeleton model to silhouette images
Automatic capture and analysis of human motion from images or video is an important issue in computer vision due to its many applications in animation, surveillance, biomechanics, human-computer interaction, and the entertainment and game industries. 3D human pose estimation is an essential part of these applications. Therefore, its accuracy has a great effect on the perform...
Pose Estimation of Players in Hockey Videos using Convolutional Neural Networks
Traditional hockey scouting procedures for evaluating player performance are based on visual monitoring of hockey videos and statistics. However, that evaluation is time-consuming and prone to human bias. In addition, current research within hockey analytics quantifies player performance by applying statistical models to common hockey statistics. To improve statistical models and increase the ...